[Epistemic Status: This is an artifact of my self-study. I am using it to remember links and help manage my focus. As such, I don't expect anyone to fully read it. If you have particular interest or expertise, skip to the relevant sections, and please leave a comment, even just to say "good work/good luck". I'm hoping for a feeling of accountability and would like input from peers and mentors. This may also serve as a guide for others who wish to study in a similar way to me.]
List of acronyms: Mechanistic Interpretability (MI), AI Alignment (AIA), Outcome Influencing System (OIS), n-Dimensional Scatter Plot (NDSP), Vanessa Kosoy's Learning Theoretic Agenda (VK LTA), Machine Learning (ML), Large Language Model (LLM).
My goals for this sprint were:
So how did I do?
Tu, July 8 | Spent about 4 or 5 hours writing SSJ #2 and then started the document for SSJ #3. About 2 hours of that time went to the section on Neel's MI guide, transcribing from my handwritten notes. The rest was split between everything else. |
Wd, July 9 | No progress. Woke early to go jogging, but didn't get enough sleep so ended up tired and distracted and eventually napped instead of working on this. |
Th, July 10 | SSJ--2. Spent about an hour reading VK LTA while on the bus. |
Fr, July 11 | SSJ--2. Spent about 2 hours reading VK LTA. |
Sa, July 12 | No progress. Went for a hike :-) |
Su, July 13 | No progress. |
Mo, July 14 | No progress. |
Tu, July 15 | No progress. |
Wd, July 16 | No progress. |
Th, July 17 | No progress. |
Fr, July 18 | SSJ--1. About 3 hours researching and thinking about the definition of a "system" in the context of OIS. I think I have a grasp on the idea I want to describe now, and just need to figure out how to write it down. |
Sa, July 19 | No progress. |
Su, July 20 | No progress. |
Mo, July 21 | SSJ--1. Worked on the definitions of "outcome", "influence", and "system" while on the bus ride home from a lecture. |
Tu, July 22 | SSJ--3. Spent 3 or 4 hours starting to draft an explanation of my research interests to reference while asking math profs at my university for help honing my math study plan. |
Well, I'm glad I am now including a daily worklog. It is embarrassing that I failed to get any work done on so many days, and I do not wish to repeat this during the next sprint. But as the Litany of Gendlin says, "What is true is already so. Owning up to it doesn't make it worse." Another good one is the Litany of Tarski: "If I haven't been managing my time well, I desire to believe that I haven't been managing my time well." Or, a personal saying of my own: "The first step to influencing a variable is being able to read its current value."
How did I do with each of my goals?
I did get some work done on this. I referenced definitions in other fields, but ended up using them to inform my thinking on the OIS definition. I think it makes more sense to get that fairly fleshed out before actually writing about other fields since the goal is to describe a mapping from the terminology of each field into OIS terminology. So it's still useful to study other fields, but not to start writing sections on them yet.
Still, I think it would be good to focus on something else for the next sprint. The OIS document is going to take me a good amount of time to complete.
I think next sprint I will switch to writing a literature review of AIA glossaries and terminology. This will be good in itself, and will help me verify my intuition that current AIA terminology is a mess and that we need a new paradigm such as OIS. Alternatively, if I disprove that intuition, I will save myself a lot of wasted effort!
I spent a good amount of time reading this, but not in a context where I was taking notes as I read, which I think was a mistake. For future reading I'm going to prioritize reading only when I can be active about it, rather than treating it like something I can passively do on my phone.
The thoughts I do have on VK's LTA are:
Also, a career advisor in an EA thread recommended I read Shallow Review of Technical AI Safety 2024, so I'm setting that as next sprint's reading. I will continue VK LTA some other time.
Didn't spend any time studying math, but I did start writing an email to send to math professors and immediately ended up yak shaving: writing a description of my current research directions and the math I am aware of relating to them. Oh well, that's probably a good thing to do anyway, so I've added it as an SSJ-1 writing task for the next sprint.
Did not start this 😥 Adding it unchanged to the next sprint.
Did not start this 😥 Adding it unchanged to the next sprint.
In addition to my 5 focuses, I'm adding a 6th! I realize a lot of the work I want to do is getting feedback from people and networking, so I'm making that more explicit by giving it its own category going forward.
Additionally, I want to put a focus on making things that are "feedback friendly". What do I mean by this?
I want to keep some focus on the idea of "feedback ready work" going forward. Critiquing other agendas, pointing out things I think are flaws and how my work fits into the context of those flaws, seems like a valuable strategy. I shouldn't just be reading agendas I agree with, but also ones I disagree with.
The Goals: